A Dependency Treebank for Telugu
Abstract
In this paper, we describe the annotation and development of a Telugu treebank following the Universal Dependencies framework. We manually annotated 1328 sentences from a Telugu grammar textbook, and the treebank is freely available from Universal Dependencies version 2.1. We also discuss some language-specific annotation issues and decisions, and report preliminary experiments with POS tagging and dependency parsing. To the best of our knowledge, this is the first freely accessible and open dependency treebank for Telugu.
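Universal Dependencies treebanks are distributed in the CoNLL-U format, with one token per line and ten tab-separated columns. As a minimal sketch of how such a file could be read, the following parses a single sentence block. The `Token` type, the `parse_conllu_sentence` helper, and the English sample sentence are all hypothetical illustrations, not material from the Telugu treebank.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """A reduced view of one CoNLL-U token line (hypothetical helper type)."""
    id: int
    form: str
    upos: str
    head: int
    deprel: str


def parse_conllu_sentence(block: str) -> List[Token]:
    """Parse one CoNLL-U sentence block into a list of Token tuples."""
    tokens = []
    for line in block.strip().splitlines():
        # Skip blank lines and sentence-level comments such as "# text = ...".
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}")
        # Skip multiword token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        tokens.append(Token(int(cols[0]), cols[1], cols[3], int(cols[6]), cols[7]))
    return tokens


# Invented toy sentence, used only to exercise the parser.
SAMPLE = "\n".join([
    "# text = She reads books",
    "1\tShe\tshe\tPRON\t_\t_\t2\tnsubj\t_\t_",
    "2\treads\tread\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tbooks\tbook\tNOUN\t_\t_\t2\tobj\t_\t_",
])

tokens = parse_conllu_sentence(SAMPLE)
```

A HEAD value of 0 marks the root of the sentence; every other token points to the ID of its syntactic head.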
Similar resources
Bidirectional Dependency Parser for Hindi, Telugu and Bangla
This paper describes the dependency parser we used in the NLP Tools Contest, 2009 for parsing Hindi, Bangla and Telugu. The parser uses a bidirectional parsing algorithm with two operations, proj and non-proj, to build the dependency tree. The parser obtained Labeled Attachment Scores of 71.63%, 59.86% and 67.74% for Hindi, Telugu and Bangla respectively on the treebank with fine-grained dependenc...
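For readers unfamiliar with the metric, the Labeled Attachment Score (LAS) is the percentage of tokens whose predicted head and dependency label both match the gold annotation; the Unlabeled Attachment Score (UAS) checks the head only. The sketch below assumes each token is represented as a `(head, label)` pair. The `attachment_scores` function and the toy data are hypothetical and are not taken from any of the papers listed here.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head index, dependency label); an assumed representation


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over aligned token sequences."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    # UAS counts tokens whose head is correct, regardless of label.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS counts tokens whose head and label are both correct.
    las_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return 100.0 * uas_hits / n, 100.0 * las_hits / n


# Invented four-token example: one wrong label and one wrong head.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "obl")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (3, "obl")]

uas, las = attachment_scores(gold, pred)
```

In this toy case three of four heads are correct and two of four arcs are fully correct, so LAS is lower than UAS, as it always is or equals.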
Comparative Error Analysis of Parser Outputs on Telugu Dependency Treebank
We present a comparative error analysis of two parsers, MALT and MST, on Telugu Dependency Treebank data. MALT and MST are currently two of the most dominant data-driven dependency parsers. We discuss the performance of both parsers in relation to the Telugu language. We also discuss in detail both the algorithmic issues of the parsers and the language-specific constraints of Telugu. Th...
Issues in Analyzing Telugu Sentences towards Building a Telugu Treebank
This paper describes an effort towards building a Telugu Dependency Treebank. We discuss the basic framework and the issues we encountered while annotating. 1487 sentences have been annotated in the Paninian framework. We also discuss how some of the annotation decisions would affect the development of a parser for Telugu.
An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies
A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...
External Sandhi and its Relevance to Syntactic Treebanking
External sandhi is a linguistic phenomenon which refers to a set of sound changes that occur at word boundaries. These changes are similar to phonological processes such as assimilation and fusion when they apply at the level of prosody, such as in connected speech. External sandhi formation can be orthographically reflected in some languages. External sandhi formation in such languages causes...